ABSTRACT

One of the biggest problems faced by current and next-generation astronomical surveys 
is trying to produce large numbers of accurate cross identifications across a range 
of wavelength regimes with varying data quality and positional uncertainty. Until 
recently simple spatial "nearest neighbour" associations have been sufficient for most 
applications. However as advances in instrumentation allow more sensitive images to 
be made the rapid increase in the source density has meant that source confusion across 
multiple wavelengths is a serious problem. The field of far-IR and sub-mm astronomy 
has been particularly hampered by such problems. The poor angular resolution of 
current sub-mm and far-IR instruments is such that in a lot of cases there are multiple 
plausible counterparts for each source at other wavelengths. Here we present a new 
automated method of producing associations between sources at different wavelengths 
using a combination of spatial and SED information set in the Bayesian framework 
presented by Budavari & Szalay (2008). Testing of the technique is performed on both 
simulated catalogues of sources from GaLICS and real data from multi-wavelength 
observations of the SXDF. It is found that a single figure of merit, the Bayes factor, 
can be effectively used to describe the confidence in the match. Further applications 
of this technique to future Herschel datasets are discussed. 
1 INTRODUCTION

Advances in astronomical instrumentation have simultane- 
ously opened up new wavelength regimes while allowing 
deeper imaging capabilities in old ones. While this has al- 
lowed great advances to be made to our knowledge of the 
high redshift Universe, it has greatly increased the difficulty 
in producing accurate cross identifications between multi- 
wavelength datasets. The underlying causes for this are 
many; pushing to deeper flux sensitivities naturally results in 
a higher source density, while fundamental limitations to the 
angular resolution of imaging across all wavelengths means 
that more of these sources will become confused. This is par- 
ticularly problematic when trying to make associations be- 
tween deep optical/near-IR datasets and equivalently deep 
datasets at other wavelengths such as UV, far-IR/sub-mm 
or X-ray where the angular resolution is typically on the 
order of several to tens of arcseconds. 

One area in which this has been a major stumbling block 
is the exploitation of the first observations in the sub-mm 
by ground based facilities around 850μm. Because of the
strong negative k-correction at such wavelengths the brightest
sources at 850μm are in fact high redshift (z > 2) starburst
galaxies (e.g. Chapman et al. 2005) and hence will be
optically quite faint. However current single dish sub-mm
facilities, such as JCMT, APEX or even the IRAM 30m
telescope, have apertures only in the 10s of metres, which
results in typical 1σ positional uncertainties of ~4-6" for
instruments such as SCUBA at 850μm (e.g. Ivison et al.
2005). Herein lies the major difficulty in identifying
counterparts to sub-mm sources: the density of sources at their
predicted flux density in optical and near-to-mid IR bands
is very high.

Ideally follow-up interferometric observations in the 
sub-mm with facilities such as the IRAM Plateau de Bure 
Interferometer (PdBI), the Submillimeter Array (SMA) and, 
in the near future, ALMA, could be used to reduce the po- 
sitional uncertainty of sub-mm sources to match that of ac- 
companying optical/near-IR data. However, given the small 
field of view and bandwidth constraints of such facilities, this 
is very observationally expensive and not a feasible option 
for the large number of sub-mm sources that upcoming projects
such as SCUBA-2 and Herschel will produce.

Typically the approach to finding accurate positions for 
sub-mm sources has been to utilise deep interferometric ra- 
dio observations, utilising the strong correlation between 1.4 
GHz continuum flux and sub-mm flux (Ivison et al. 1998, 
2000; Smail et al. 2000; Ivison et al. 2002, among many
others). However this is also observationally expensive: the ratio
between the sub-mm flux and the 1.4 GHz flux is expected
to be around 100 at z ~ 2. Given that the brightest sub-mm
sources have a flux around 10 mJy at 850μm, radio follow-up
has to be deeper than at least ~100μJy. Even for upcoming
state of the art facilities this is a difficult proposition;
with the E-VLA it will take 10-20 hrs/sq. deg. to survey these sorts
of depths. Thus it is clear that large areas of interferometric
radio data of adequate depth will not be readily available 
for some time. 

On top of this, even in deep radio data counterparts are often
not found; indeed there is some evidence that the most
luminous sub-mm galaxies are also radio-dim (Younger et al.
2007; Ivison et al. 2002).

Identification of counterparts in mid-IR observations
from Spitzer has been attempted by several authors.
Ivison et al. (2004) and Egami et al. (2004) were amongst
the first to try and utilise deep Spitzer, in particular MIPS
24μm, imaging to find counterparts for sub-mm sources.
Ivison et al. identified reliable 24μm counterparts for 8 out
of 9 > 3σ MAMBO 1200μm sources. Egami et al. similarly
found 24μm identifications for 7 out of 10 > 3σ SCUBA
sources, with the remaining three lacking both radio and
24μm counterparts. Further work paints a similar picture,
with 24μm counterparts for most sources, but generally only
those which are also quite strong in the radio. Using deep
8μm IRAC data Ashby et al. (2006) were able to identify
reliable counterparts for 17 SCUBA detected SMGs in
the CUDSS 14 hour field. Of these only 5 had previous
1.4 GHz radio counterparts from relatively shallow imaging
(~60μJy), highlighting the usefulness of shorter wavelength
identifications in the absence of good radio data. Ivison et al.
(2007) found statistically significant 24μm counterparts for
53 out of 120 SCUBA sources from the SHADES survey of
SXDF and Lockman Hole, however only 11 of these were
previously undetected in deep radio data.

In both the mid-IR and radio identifications the "goodness"
of an association is determined using the p statistic
(Lilly et al. 1999; Ivison et al. 2002), which is defined as the
probability that a radio or mid-IR source of a particular flux
density could be found by chance at a particular distance from
the sub-mm source. In this way most catalogues of
multi-wavelength associations to sub-mm sources are constructed
by taking those with radio/mid-IR associations that have
p < 0.05, i.e. they have a less than 5% chance of being
spurious. This approach is very useful for finding radio
associations, as the density of μJy 1.4 GHz radio sources is still
quite low compared to the typical positional uncertainty in
the sub-mm. However in the mid-IR similarly faint 8 or 24μm
sources are numerous enough that only strong or very
nearby sources have sufficiently small p statistics to be
considered as confident associations.
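
To illustrate why the p statistic struggles in the mid-IR, the sketch below evaluates the commonly used Poisson form, p = 1 - exp(-π n θ²), where n is the surface density of sources at least as bright as the candidate and θ is the offset from the sub-mm position. The functional form is an assumption here (the text above only defines p in words), and both surface densities are hypothetical values chosen purely for illustration.

```python
import math

def p_statistic(separation_arcsec, density_per_sq_arcsec):
    """Chance probability of finding at least one unrelated source within
    `separation_arcsec`, assuming sources are Poisson distributed on the sky."""
    expected = math.pi * separation_arcsec ** 2 * density_per_sq_arcsec
    return 1.0 - math.exp(-expected)

# Hypothetical surface densities (sources per square arcsec that are at
# least as bright as the candidate counterpart); not taken from the paper.
radio_density = 1.0e-4
midir_density = 2.0e-3

# The same 5" offset gives a confident radio ID but a rejected mid-IR ID.
p_radio = p_statistic(5.0, radio_density)
p_midir = p_statistic(5.0, midir_density)
```

With these illustrative numbers the radio candidate passes the p < 0.05 cut while the mid-IR candidate at the same offset does not, which is the behaviour described in the paragraph above.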

One thing that is clear is that with the advent of new
wide-area surveys in the sub-mm using facilities such as
BLAST (Pascale et al. 2008), SCUBA-2 and Herschel, deep
radio data will not be readily available for the significant
numbers of SMGs that will be detected. Thus an alternative
approach which is able to utilise the multi-band data at hand
is needed.

Here we present a new technique to narrow down the
number of potential matches using the Bayesian statistical
framework found in Budavari & Szalay (2008). Importantly
our technique considers both the SED and spatial information
in determining which combination of multi-wavelength
data is associated with a sub-mm/far-IR source. The
formalism of this new technique is presented in Section 2. The
real and simulated datasets utilized in testing this technique
are described in Section 3, with the results of these
tests presented in Sections 4 & 5 respectively. Finally
Section 6 discusses the benefits of this technique, while
Section 6.5 demonstrates its applicability to upcoming Herschel
datasets.



2 CROSS-IDENTIFICATION TECHNIQUE 

Our association technique is broken down into a two step 
process. First spatial matching is performed to find poten- 
tial associations between a sub-mm source and the objects 
in the catalogues at other wavelengths. This process is per- 
formed as per the iterative technique presented in Budavari 
& Szalay (2008; henceforth BS08). The BS08 approach re- 
lies on Bayesian hypothesis testing, where the hypothesis 
under consideration is that n sources from n catalogues at 
different wavelengths originate from the same astronomical 
object. This is compared to the alternative hypothesis that 
the n sources come from n different astronomical objects. 
The Bayes factor is essentially the ratio of the posterior odds to the
prior odds of these two scenarios. The full mathematical
basis for this technique is summarised in the Appendix.
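
As a rough illustration of the spatial term, the sketch below evaluates the two-catalogue Bayes factor from BS08 in the small-angle limit, B = 2/(σ1² + σ2²) exp[-ψ²/(2(σ1² + σ2²))], with the separation ψ and the positional errors σ1, σ2 in radians. Whether the analysis in this paper applies exactly this form (rather than the iterative n-catalogue version) is an assumption, and the function name and example values are hypothetical.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def ln_bayes_spatial(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
    """ln of the two-catalogue spatial Bayes factor (BS08, small-angle limit).

    All inputs are in arcseconds and converted to radians internally,
    since the 2/(sigma1^2 + sigma2^2) prefactor is defined in radians.
    """
    sep = sep_arcsec * ARCSEC_TO_RAD
    var = (sigma1_arcsec ** 2 + sigma2_arcsec ** 2) * ARCSEC_TO_RAD ** 2
    return math.log(2.0 / var) - sep ** 2 / (2.0 * var)

# A 3" sub-mm error matched to a 0.2" high-resolution position:
# the evidence falls as the candidate moves away from the sub-mm centroid.
ln_b_near = ln_bayes_spatial(0.0, 3.0, 0.2)
ln_b_far = ln_bayes_spatial(3.0, 3.0, 0.2)
```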

One major disadvantage of the BS08 approach is that
it only considers as an alternative hypothesis that the
n sources come from n different physical objects, ignoring
the likely scenario that some, but not all, of the sources
from different catalogues are associated. In sub-mm/far-IR
astronomy we are typically trying to associate one set
of sources with poor positional uncertainties (i.e. our
sub-mm source catalogue) to sets of sources with very accurate
positional information (i.e. optical or interferometric radio
source catalogues). Thus forming reliable associations
between the high resolution data is relatively easy and can
be accomplished with simple nearest neighbour techniques.
The real challenge is establishing the link between these
associations and our poorly resolved sub-mm/far-IR sources.
Given this we take a slightly different philosophical approach
than BS08. Rather than consider that the n sources come
from n distinct objects as an alternative hypothesis, we
consider that the n sources come from 2 distinct objects: the n - 1
high-resolution sources from one astronomical object and
the sub-mm/far-IR source from another. While this does
not affect the calculation of the Bayes factor for the spatial
associations (it is simply the same as considering the inputs
as two catalogues as opposed to n), it has a fundamental
effect on the way we calculate the Bayes factor for the SED.
Here we calculate the Bayes factor for the SED by
comparing the hypothesis that an object has the measured flux
(H) to the alternative hypothesis that it actually has a flux
below the detection limit (K). Mathematically we calculate
this via the Bayes factor:

B_{sed} = \frac{\int p(\eta|H)\, p(g|\eta, H)\, d\eta}{\int p(\eta|K)\, p(g|\eta, K)\, d\eta}

where η represents the parameterisation (T, z, A_v) of each
template T, redshift z and extinction A_v considered.
p(g|η, M) is the probability of the observed fluxes g given η under
hypothesis M (M = H or K), p(η|M) is the prior probability of η
under M, and the integrals run over the range
of models, redshifts, and dust extinction.

Here we assume that the likelihood of a SED being "correct"
is given by the χ² distribution via

p(g|\eta, M) \propto \exp\left\{ -\frac{1}{2} \sum_{i=1}^{N} \frac{[g_i - b f_i(T, z, A_v)]^2}{\sigma_{g_i}^2} \right\}

where g_i is the observed flux in passband i, f_i the model
flux in i given (T, z, A_v), b the normalisation factor and σ_{g_i}
the error on g_i. N is the number of observed bands.

In instances where candidate matches are undetected in
one or more bands the flux limits are introduced as a proxy
for the measured flux. If the model flux for passband i is
below the flux limit it is not considered in the sum (i.e.
g_i - b f_i(T, z, A_v) = 0). However if the model flux for passband i
is greater than the flux limit the flux limit is used, with the
error assumed to be the typical error for sources near the
limit.
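
The treatment of non-detections described above can be sketched as follows. This is a minimal illustration, assuming the Gaussian form ln p = -χ²/2 for the likelihood and assuming that, for undetected bands, the `flux` entry holds the flux limit and the `sigma` entry holds the typical error near that limit. The function name and the example fluxes are hypothetical.

```python
def ln_likelihood(flux, sigma, model, detected):
    """ln p(g | eta, M) up to an additive constant.

    flux     : measured flux per band, or the flux limit if undetected
    sigma    : flux error per band, or the typical error near the limit
    model    : scaled model flux b * f_i per band
    detected : True where the band has a measured flux
    """
    chi2 = 0.0
    for g_i, s_i, m_i, is_detected in zip(flux, sigma, model, detected):
        if is_detected:
            chi2 += ((g_i - m_i) / s_i) ** 2
        elif m_i > g_i:
            # Model predicts a flux above the limit: penalise it
            # relative to the limit itself.
            chi2 += ((g_i - m_i) / s_i) ** 2
        # Model below the limit: the band contributes nothing.
    return -0.5 * chi2

flux = [10.0, 5.0, 2.0]          # third entry is a flux limit
sigma = [1.0, 1.0, 0.5]
detected = [True, True, False]

ln_l_below = ln_likelihood(flux, sigma, [11.0, 5.0, 1.0], detected)
ln_l_above = ln_likelihood(flux, sigma, [11.0, 5.0, 3.0], detected)
```

A model that stays under the limit in the undetected band is judged only on the detected bands, while one that overshoots the limit is penalised.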

The SED fitting we perform follows a similar prescription
to the photometric redshift estimation technique
described in Rowan-Robinson et al. (2008; henceforth RR08),
however with several key differences. The subtle difference
between normal photometric redshift estimation and the
technique used here is that we are performing Bayesian
hypothesis testing, not parameter estimation. Hence the actual
best fit parameters (i.e. redshift, template, A_v) are not the
goal here; the aim is to statistically test whether or not a
particular combination of sources from different catalogues
is responsible for the flux detected. We utilise a subset of the
templates from RR08 to fit our candidate matches. For the
optical to near-IR only the 7 galaxy templates from RR08
are considered: 2 elliptical, 5 spiral types from Sa to Sdm. In
the far-IR we consider an Arp220, M82 and Cirrus template,
again taken from RR08. While more templates, and indeed
combinations between templates, are typically required to
produce good SED fits we find that giving the fitting
process this additional freedom makes it easier to obtain "good"
fits (i.e. low χ²) to clearly mismatched sources.

Redshifts in the range 0-4 are considered, with a step
size of 0.002 in log(1 + z). Dust extinction in the range 0 <
A_v < 1 is also considered, with the form of the extinction
as per Calzetti et al. (2000).
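
For concreteness, the redshift grid described above can be built as in the sketch below. It assumes the logarithm is base 10 and that the grid starts at z = 0; both are assumptions, as is the function name.

```python
import math

def redshift_grid(z_max=4.0, step=0.002):
    """Redshifts evenly spaced in log10(1 + z), from z = 0 up to at most z_max."""
    n_steps = int(math.log10(1.0 + z_max) / step)
    return [10.0 ** (i * step) - 1.0 for i in range(n_steps + 1)]

grid = redshift_grid()
```

Under these assumptions the grid holds 350 trial redshifts, with spacing that grows from about 0.005 at z = 0 to about 0.02 near z = 4.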

The fitting process itself is performed via the least
squares fitting of two components (optical + far-IR
templates) to the observed fluxes via the use of a non-negative
least squares fitting algorithm, in this case the Bounded
Variable Least Squares (BVLS) algorithm (Stark P.B.
1995).
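
A two-component, non-negative fit of this kind can be sketched with SciPy, whose `scipy.optimize.lsq_linear` offers a `method="bvls"` option. This is only an illustration of the idea: the paper does not state which BVLS implementation was used, and the function name, template values and fluxes below are hypothetical.

```python
import numpy as np
from scipy.optimize import lsq_linear

def fit_two_components(flux, sigma, optical_template, farir_template):
    """Fit flux ~ a * optical + b * far-IR with a, b >= 0.

    Dividing each band by its error makes the squared residual
    norm equal to chi^2. Returns the amplitudes (a, b) and chi^2.
    """
    design = np.column_stack([optical_template, farir_template]) / sigma[:, None]
    target = flux / sigma
    result = lsq_linear(design, target, bounds=(0.0, np.inf), method="bvls")
    # lsq_linear reports cost = 0.5 * ||design @ x - target||^2
    return result.x, 2.0 * result.cost

# Hypothetical templates sampled in four bands.
optical = np.array([1.0, 0.8, 0.1, 0.0])
farir = np.array([0.0, 0.1, 0.5, 1.0])
sigma = np.full(4, 0.1)

# Fluxes built from known amplitudes (2, 3) are recovered exactly.
amps, chi2 = fit_two_components(2.0 * optical + 3.0 * farir, sigma, optical, farir)

# Fluxes that would need a negative far-IR amplitude are clipped at zero.
amps_clipped, chi2_clipped = fit_two_components(
    2.0 * optical - 1.0 * farir, sigma, optical, farir)
```

The non-negativity bound matters for the hypothesis test: a candidate whose photometry can only be matched by subtracting a far-IR component is left with a poor χ² rather than an unphysical fit.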

As we wish to demonstrate the general applicability of
the technique, a minimum level of priors is assumed. Of
course the selection of the SED templates is in itself a very
strong prior, however as the templates used in this work have
been shown by RR08 to match a large fraction of the SWIRE
galaxy population it is reasonable to believe that they
represent a fair sampling of the underlying galaxy population. In
addition, to ensure that we are not assigning statistical
significance to implausible solutions, a luminosity prior is included
in the same fashion as RR08, i.e. -17 - z > M_B > -22.5 - z
for z < 2 and -19.5 > M_B > -25 for z > 2.
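
The luminosity prior can be sketched as a simple top-hat in M_B. The limits below assume the reading -17 - z > M_B > -22.5 - z for z < 2 and -19.5 > M_B > -25 for z > 2, and assume that solutions outside the range are excluded outright rather than down-weighted. The behaviour at exactly z = 2 is not specified in the text and is assigned to the high-redshift branch here.

```python
def luminosity_prior(m_b, z):
    """Top-hat prior: 1.0 inside the allowed M_B range at redshift z, else 0.0."""
    if z < 2.0:
        faint_limit, bright_limit = -17.0 - z, -22.5 - z
    else:
        faint_limit, bright_limit = -19.5, -25.0
    # Magnitudes: more negative means more luminous.
    return 1.0 if bright_limit < m_b < faint_limit else 0.0
```

For example, at z = 1 a solution with M_B = -20 is allowed, while one with M_B = -17.5 is too faint to be considered plausible.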

It is impractical to consider every combination
of every source in the input catalogues. Thus at each
step combinations of sources which are separated by more than some
arbitrary search radius (typically 3-5 times the assumed
positional error) can be excluded from the calculations. After
a list of candidate spatial matches has been compiled using
the BS08 formalism, SED fitting is performed on each to try
and find the best match to the source.

Thus the algorithm for determining the final association
is as follows:

(i) Calculate ln B_spatial and ln B_sed for each association
within an arbitrary radius.

(ii) Add together ln B_spatial and ln B_sed to give the final
Bayesian evidence, ln B_tot.

(iii) Find the largest value of ln B_tot out of the potential
matches. This is the final association.
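
The three steps above can be sketched as follows, taking the spatial and SED terms as already computed. The candidate labels and ln B values are hypothetical and serve only to show that the best spatial match need not be the final association.

```python
def best_association(candidates):
    """Pick the candidate with the largest ln B_tot = ln B_spatial + ln B_sed.

    candidates : list of (label, ln_b_spatial, ln_b_sed) tuples, already
                 restricted to those inside the search radius.
    Returns (label, ln_b_tot) for the winning candidate.
    """
    scored = [(label, ln_sp + ln_sed) for label, ln_sp, ln_sed in candidates]
    return max(scored, key=lambda item: item[1])

# Candidate A is closest on the sky, but its SED is a poor fit.
candidates = [("A", 8.0, -4.0), ("B", 6.5, 1.5), ("C", 3.0, 2.0)]
label, ln_b_tot = best_association(candidates)
```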



3 DATA 

To demonstrate the effectiveness of our technique we make
use of both simulated data from GaLICS (Hatton et al.
2003) and real observations of the Subaru-XMM Deep Field
(SXDF) from ground and space based facilities.

The GaLICS simulations are well-suited to our purposes 
as they incorporate realistic clustering, realistic star forma- 
tion histories and galaxy properties, and simulated SEDs 
which cover a wavelength range from UV through to sub- 
mm. 

The real data focuses on the SHADES SCUBA observations
in the Subaru XMM-Newton Deep Field (SXDF). This
dataset has the advantage of having accompanying ancillary
data at optical, IR and radio wavelengths, allowing high
precision multi-wavelength associations to be made for a large
number of the SCUBA sources (Ivison et al. 2007; Clements
et al. 2008). This allows us to test our optical to far-IR
identifications against a "truth" list of radio identifications.

The SHADES survey performed 850μm observations
with SCUBA on the James Clerk Maxwell Telescope
(JCMT) of a 0.2 sq. deg pointing coincident with the Subaru
XMM-Newton Deep Field (SXDF). Details of the observations
and resulting maps and catalogues can be found in
Coppin et al. (2006).

For the SHADES SCUBA associations we utilise data
from a number of deep surveys in the SXDF region. In the
optical we utilise the public DR1 release of the Subaru
XMM-Newton Deep Field Survey (SXDS; Furusawa et al. 2008). SXDS
observed 5 fields in a 'plus' shaped pattern centered on
Right Ascension = 02h18m00s and Declination = -05°00'00"



4 I.G.Roseboom 

with the SuprimeCam instrument on the Subaru telescope.
The SHADES field is wholly contained within the single
central SuprimeCam pointing. Observations were performed in
5 optical bands, B, V, R, i', z', with 3σ, 2" aperture, AB
mag depths of 27.5, 27.5, 27, 27, and 26 respectively.

In the near-IR we utilise data from the UKIDSS
(UKIRT Infrared Deep Sky Survey; Lawrence et al. 2007)
Ultra Deep Survey (UDS). UKIDSS uses the UKIRT Wide
Field Camera (WFCAM; Casali et al. 2007) and a
photometric system described in Hewett et al. (2006). The pipeline
processing and science archive are described in Irwin et al.
(2008, in prep) and Hambly et al. (2008). We utilise the DR3
release of the UDS dataset, which contains photometry in
J, H and K to a 5σ depth of 23.7, 23.5, and 23.7 AB mags
respectively. The UDS field is coincident with the SXDF,
covering the extent of the SHADES SCUBA observations.

For the mid-far IR we utilise data from the SWIRE
survey. SWIRE contains imaging of the entire XMM-LSS
field with both the IRAC and MIPS instruments on Spitzer.
This results in a 5 band dataset, with flux measurements
centred on 3.6, 4.5, 5.8, 8.0 and 24.0μm. While the MIPS 70μm
and 160μm data are included in the analysis, only a very small
number of SHADES sources are found to have nearby 70
and/or 160μm sources in SWIRE and thus they are of little use
in the vast majority of cases.



4 TESTING ON SIMULATED CATALOGUES: 
GALICS 

As an initial test we try to recreate in simulated data the
real scenario that will be considered later in this paper: the
matching of 850μm sources to deep optical and Spitzer IRAC
& MIPS data with no redshift information. The use of the
simulations at this stage is vital as it offers the convenience
of a perfect truth list to test against, i.e. we know the true
underlying association for each object a priori, something
which is never truly possible with real data.

We select 1 cone (Cone 1: 1 sq. deg.) of GaLICS
simulations with photometry in 5 optical bands (B, V, R, i', z'),
3 near-IR bands (J, H, K), the 4 Spitzer IRAC bands, and
the MIPS 24 & 70μm bands. The simulated data is then
broken up into three catalogues: an optical-near IR
catalogue, a "SWIRE" Spitzer IRAC+MIPS 24μm catalogue
and a SCUBA 850μm catalogue. Flux limits are introduced
to make these catalogues resemble those found in the SXDF.
Objects with B < 27, S_3.6 > 10μJy and S_850 > 2 mJy are
kept in the optical-near IR, "SWIRE" and SCUBA catalogues
respectively. Additional limits are placed on the
fluxes in the "SWIRE" catalogue to take into account the
varying sensitivity between the IRAC channels and MIPS
24μm. Catalogued objects with flux values less than 40μJy at
IRAC 5.8μm & 8.0μm, or < 50μJy at 24μm, are treated as
undetected at these wavelengths in the analysis. While these
limits are somewhat lower than the real data in SXDF they
better match the observed number density of objects in each
catalogue. This mismatch is a result of a natural disparity
between the number density of far-IR luminous sources in
the GaLICS simulations compared to the real Universe.

These cuts result in catalogues of 253 SCUBA 850μm
sources, 34932 Spitzer sources (10817 with 24μm), and
306842 optical+near-IR sources, respectively.

As the flux limits for each catalogue are imposed in 



Table 1. Summary of completeness (C) and reliability (R) of
matching between simulated optical, Spitzer and SCUBA band
catalogues, where we require that the association has a measured
flux at 24μm.

                                  Total   Correct   C     R
P_24 < 0.05                       106     92        79%   86%
Bayesian Matching (no cut)        181     115       98%   64%
Bayesian Matching (ln B > 5)      89      85        73%   96%
Bayesian Matching (ln B > 2.2)    106     96        83%   90%

different wavelength regimes there is a natural disparity
between the catalogues; this is reflected by the fact that not
all of the sources in one catalogue have matches in the other
two. Specifically, only 129 of the 253 mock SCUBA sources
have corresponding entries in the Spitzer catalogue. Of these
117 are "detected" at 24μm (i.e. model S_24μm > 50μJy).

The positions of objects in the three catalogues are
independently scattered by Gaussian random errors, with the
positional uncertainty in each case being: Optical 0.1", Spitzer
0.2" and SCUBA 850μm 3".

For our first test we try and find associations between
our three catalogues requiring that a 24μm detection is
present in the Spitzer catalogue (i.e. S_24μm > 50μJy). For
comparison we also find the best 24μm association for each
mock SCUBA source using the p-statistic (Downes et al.
1986). Table 1 summarises our results in terms of
completeness (total number of correct matches over all true
associations) and reliability (number of correct matches over
total made). Like the p-statistic our approach relies on a
single statistic to determine the believability of an association:
the Bayesian evidence ln B. Three ln B selection thresholds
are presented: none, ln B > 5, and ln B > 2.2. The ln B > 5
selection is consistent with "strong evidence" for a match
according to Jeffreys' scale (Jeffreys 1961). A final
selection (ln B > 2.2) which matches the number of
associations found via the p < 0.05 selection is also shown. This
is to enable a fair and direct comparison between the two
methods.
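
The two figures of merit can be computed directly from the match counts. The sketch below uses the counts quoted in Table 1 together with the 117 true associations that are detected at 24μm; the function name is hypothetical.

```python
def completeness_reliability(n_correct, n_matched, n_true):
    """Completeness = correct / all true associations;
    reliability = correct / all associations made."""
    return n_correct / n_true, n_correct / n_matched

N_TRUE_24 = 117  # mock SCUBA sources with a 24 micron detection

# p < 0.05 selection: 106 matches, 92 correct.
c_pstat, r_pstat = completeness_reliability(92, 106, N_TRUE_24)

# ln B > 5 selection: 89 matches, 85 correct.
c_bayes, r_bayes = completeness_reliability(85, 89, N_TRUE_24)
```

A stricter ln B threshold trades completeness for reliability, which is why the comparison with the p-statistic is made at a matched number of associations.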

From Table 1 it can be seen that the Bayesian analysis
performs similarly to the p-statistic in correctly associating
sub-mm to shorter wavelength counterparts, with both
achieving a completeness of around 80% and a reliability of
~90%. One advantage of the Bayesian approach over the
p-statistic is its ability to make correct identifications at
the highest redshifts. This can be seen in Figure 1, which
shows the redshift distribution for the correct Bayesian and
p-statistic matches. The Bayesian approach correctly recovers
38/42 associations at z > 1.5 while the p-statistic only
recovers 20/42. The reason for this is clear: the 24μm flux
of these sources drops dramatically as a function of redshift,
whereas the 850μm flux, as a result of the negative k-correction,
stays relatively constant. Thus the 24μm associations
for high-z SCUBA sources will always be very faint,
and hence have a high number density. As the p-statistic
is based on the number density of sources at or above a
given flux level it will often determine that faint, higher-z
counterparts have a high chance of being spurious.

However Figure 1 does show one failure of the Bayesian
approach, which is a decrement of correct associations
around z ~ 1-1.4. This can be attributed to the fact that
Figure 1. Redshift distribution of correctly recovered
associations using both our Bayesian approach (solid line) and the
p-statistic (dot-dashed line). The Bayesian approach appears to
recover more high-z (z > 1.5) associations, while the p-statistic
is more effective at low to intermediate redshift. This can be
attributed to the different philosophy of each approach, as explained
in the text.



Table 2. Summary of completeness (C) and reliability (R) of
matching between simulated optical, Spitzer and SCUBA band
catalogues, where we do not require that the association has a
measured flux at 24μm.

                                  Total   Correct   C     R
Base priors
P_24 or P_3.6 < 0.05              109     93        72%   85%
Bayesian Matching (no cut)        225     115       89%   51%
Bayesian Matching (ln B > 5)      130     93        72%   72%
Bayesian Matching (ln B > 8)      109     85        66%   78%
Redshift and extra M_B prior
Bayesian Matching (no cut)        225     117       91%   52%
Bayesian Matching (ln B > 5)      110     94        73%   85%
Bayesian Matching (ln B > 5.2)    109     93        72%   85%

the GaLICS simulated SEDs do not contain as strong absorption
in the silicate feature at rest frame 9.7μm as found in the
RR08 templates. As this enters the 24μm band at z ~ 1 the
"observed" 24μm flux is always found to be much greater
than would be expected from the RR08 templates. This
results in poor fits to the available templates, in turn causing
associations whose true redshift is in this range to be given
low evidence.

While associations with 24μm sources are a reasonably
reliable "gateway" to associations at shorter wavelengths,
unfortunately not all sub-mm sources will have detections at
24μm. Thus we would like to be able to find reliable
associations for these sources on the basis of Spitzer IRAC and
optical/near-IR data alone. To determine if our approach
can successfully produce associations in the absence of 24μm
detections we repeat the analysis above, but this time
allow sources from the Spitzer catalogue without 24μm to be
considered as potential counterparts. The completeness and
reliability of these associations is given in Table 2.

By allowing associations between sub-mm and IRAC
only (i.e. 3.6μm) sources the reliability of our associations
drops considerably. For comparison the p-statistic is also
computed for the 3.6μm sources, with the final p-statistic
determined association the better of the 3.6μm or 24μm
associations, where available.

Under these circumstances our approach results in
considerably worse completeness and reliability than the
p-statistic for the same number of objects selected. This is
not unexpected: the number of 850μm sources in the
simulations without accompanying 24μm detections is very small
(12/129), while the number of Spitzer catalogue objects
without 24μm is much higher (24115/34932).

Encouragingly we correctly identify all 12 850μm
sources without 24μm counterparts with strong evidence
(ln B > 9), while the p-statistic only recovers one of these.

Investigating the properties of the incorrect associations
made by our technique immediately reveals the reason
for such poor performance in this scenario. Figure 2
shows M_B vs. redshift for the ln B > 5 associations.
The vast majority of the mismatches are located in the
redshift range z = 1.2-1.6, with systematically lower optical
luminosities. This failure of our approach can be attributed
to the erroneous treatment of the 9.7μm silicate
absorption feature in the model SEDs. While in the case of the
24μm only associations this hampered our ability to make
associations with strong evidence, here 3.6μm associations
with strong evidence are able to be made as the predicted
24μm flux from the templates is much lower, and
hence much closer to (or below) the S_24μm > 50μJy flux
limit. Knowing this we can introduce a prior based on the
M_B-z evolution such that these low-luminosity, intermediate
redshift solutions are strongly disfavoured. The prior
introduced is defined as p = 1 - (M_B + 19.5 + 0.9z)
for -0.9z - 19.5 < M_B < -0.9z - 18.7 & z > 0.3,
p = 1 for M_B < -0.9z - 19.5 or z < 0.3, and p = 0 for
M_B > -0.9z - 18.7 & z > 0.3. In a further effort to recover
the true associations we also introduce the true redshift
distribution of the 850μm sources as a prior, where the redshift
distribution of sources is modelled as a Gaussian with mean
z = 2.24 and σ = 0.945. Again this prior is only invoked at
z > 0.3 so as not to affect our ability to recover the small
number of low luminosity 850μm sources at low redshift.
The result of repeating the analysis with the introduction of
these priors is given in Table 2.
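
The two additional priors can be written down compactly, as in the sketch below. It follows the piecewise definition given above, with p = 0 on the faint side of the ramp. The normalisation of the Gaussian redshift prior is not stated in the text, so it is left unnormalised (peak value 1) here, and z = 0.3 exactly is treated as belonging to the unaffected low-redshift regime; both choices are assumptions.

```python
import math

def mb_z_prior(m_b, z):
    """Prior disfavouring low-luminosity, intermediate-redshift solutions."""
    if z <= 0.3:
        return 1.0
    bright_edge = -0.9 * z - 19.5
    faint_edge = -0.9 * z - 18.7
    if m_b < bright_edge:
        return 1.0
    if m_b > faint_edge:
        return 0.0
    # Linear ramp between the two edges, as defined in the text.
    return 1.0 - (m_b + 19.5 + 0.9 * z)

def redshift_prior(z, mean=2.24, sigma=0.945):
    """Unnormalised Gaussian redshift prior, only applied at z > 0.3."""
    if z <= 0.3:
        return 1.0
    return math.exp(-0.5 * ((z - mean) / sigma) ** 2)
```

At z = 1, for instance, a solution with M_B = -21 is unaffected, one with M_B = -20 is down-weighted to 0.6, and one with M_B = -19 is excluded.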

As with the 24μm only associations we succeed in doing
as well as a p-statistic based analysis, with the major benefit
being the larger number of high-z associations made. In this
scenario 46/53 z > 1.5 850μm sources are correctly identified
with strong evidence, while only 24/53 are recovered by the
p-statistic.



5 TESTING ON SHADES SCUBA SOURCES 

To test the effectiveness of our association technique on real
data we try to reproduce the optical-mid-IR associations
of SHADES SCUBA sources with confident radio IDs in
SXDF compiled by Ivison et al. (2007; hereafter I07) and
subsequently analysed to produce photometric redshifts and
SED fits by Clements et al. (2008; hereafter C08). Here we
propose a simple test for the application of our association
algorithm: produce associations with only the optical and
infrared datasets and see if these coincide with the 33 SMGs
with confident radio associations presented in I07 & C08.
Table 3 compares the associations and best-fit photometric
redshifts from I07 & C08 to those determined using our
Bayesian approach.

Figure 2. M_B vs. redshift for correct (asterisks) and incorrect
(crosses) associations with strong evidence (ln B > 5). The vast
majority of incorrect associations are found in the redshift range
1.2 < z < 1.6 with low M_B. The parameter space between the
dotted lines represents the region which we attempt to disfavour
in the SED fitting via the introduction of the prior discussed in
the text.

Several important caveats must be introduced before
the matching can be performed. Firstly only SWIRE
catalogue objects with detections in 2 or more bands are
considered. This restriction is imposed as a faint single band
SWIRE detection is highly likely to be spurious, and also
easily fit by any far-IR SED. As we are now dealing with
real data the redshift and the extra M_B priors introduced
in the previous section should no longer be necessary as
these were introduced to overcome deficiencies in the model
SEDs. Thus the only priors included are the M_B prior from
RR08 to ensure plausible luminosities, and the implicit prior
that the SEDs are well approximated by our limited range
of templates. In addition, as we expect the SED fitting to
be more successful on the real data we consider all matches,
not just those with 24μm IDs. While any associations made
without the benefit of a 24μm detection should be treated
with great care, it will be very interesting to see if our
technique can be successful in making reliable associations with
IRAC & near-IR/optical data alone.

To account for the limited range of templates and
systematic differences between these and the data, a minimum
error is enforced on all of the flux densities. For the optical
and UKIDSS near-IR data a minimum error of 0.05 mags is
enforced. For the IRAC data a minimum error of 5% is
used, while for the MIPS & SCUBA data a minimum error
of 10% is used. In addition, only optical/IR sources within
a 15" radius of the SCUBA source are considered. While in
principle this technique should consider sources at all
separations, for practical reasons, i.e. the limitation of computer
power, it is necessary to impose a maximum search radius.
This radius was chosen as it is well matched to the beam of
the SCUBA instrument at 850μm and safely encompasses
all the associations presented in C08.
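
The error floors and search radius described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: the
function names, the dictionary layout of a source and the small-angle
separation formula are hypothetical choices made here, not the code used for
the analysis.

```python
import math

# Minimum fractional flux errors quoted in the text (hypothetical container).
MIN_FRACTIONAL_ERROR = {"IRAC": 0.05, "MIPS": 0.10, "SCUBA": 0.10}
MIN_MAG_ERROR = 0.05          # optical and UKIDSS near-IR, in magnitudes
MAX_RADIUS_ARCSEC = 15.0      # maximum search radius around the SCUBA source


def floor_mag_error(mag_err, min_err=MIN_MAG_ERROR):
    """Return the magnitude error, raised to the floor if it is smaller."""
    return max(mag_err, min_err)


def floor_flux_error(flux, flux_err, instrument):
    """Return the flux error, raised to the fractional floor if smaller."""
    return max(flux_err, MIN_FRACTIONAL_ERROR[instrument] * abs(flux))


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec; all inputs in degrees (assumption)."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    return 3600.0 * math.hypot(dra, dec1 - dec2)


def candidates_within_radius(scuba_ra, scuba_dec, sources,
                             radius=MAX_RADIUS_ARCSEC):
    """Keep only the sources inside the search radius of the SCUBA position."""
    return [s for s in sources
            if separation_arcsec(scuba_ra, scuba_dec, s["ra"], s["dec"]) <= radius]
```

The floors only ever increase an error, so well-measured but poorly modelled
bands cannot dominate the fit; the radius cut simply bounds the number of
candidates that must be fitted per SCUBA source.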

As can be seen from Table 3, 20 of the 33 confident
I07/C08 IDs are reproduced exactly using our technique, 4
have different associations, but are among the multiple
associations I07/C08 considered plausible², while 9 are totally
inconsistent with I07/C08.

Of the 9 discrepant associations three (SXDF850.1,
SXDF850.10, SXDF850.12) are only detected in a single
Spitzer band in SWIRE and hence cannot be recovered here.
The alternative associations that we make for SXDF850.1
and SXDF850.12 are at separations greater than 10" and have
relatively weak ln B_tot. SXDF850.10 is a nightmare scenario for
the technique as it has a spurious counterpart very nearby
(1.1") with a photo-z in the 1.1 < z < 1.4 redshift range
where 24μm is expected to be weak/non-detected due to
silicate absorption.

The other six are interesting cases; SXDF850.3 has
a very unusual SED with relatively bright IRAC 3.6 and
4.5μm flux but nothing in the other IRAC bands or MIPS
24μm, despite the best fit template SED predicting the fluxes
in these bands to be well above the detection limit. This was
also noted by C08 and investigations into the origin of this
unusual SED shape, whether it be real or a result of
problems with the data, will be presented in future work. As a
result an alternative match at large separation is found here,
however this has very weak evidence (ln B_tot = -5.1).

SXDF850.29 is also a problematic association for C08.
Both of the bright nearby optical sources are at low redshift
(z ≈ 0.18) and do not have SEDs suggestive of strong 850μm
emission. They suggest that this could in fact be a lensed
system, with the sub-mm flux coming from a background
galaxy which may appear as a small amount of extended
3.6μm emission from one of the galaxies. Here we associate it
with the nearby bright SWIRE source (2902 from C08), but with
very weak evidence. While this may indeed be a lensed system,
complicated situations such as this are unlikely to be
recovered by our technique.

SXDF850.35 is a very interesting case. The SWIRE data
contain detections in all 4 IRAC bands and MIPS 24μm
which provide quite a good fit to the M82 template. However
at the determined photo-z the best fit template does not
contain enough 850μm flux to be detected by SCUBA. Thus
this match is given very weak evidence (ln B_sed = -103).
We make an alternative association with a slightly more
distant source, but again with very weak evidence, suggesting
it is not a plausible alternative. Either the C08/I07
association is wrong here, or this is a clear case of the small number
of SED templates used here not being sufficient.

SXDF850.69 is designated as a less secure association by
C08 as the position of the associated radio source is 13" away
from the SCUBA position. We find a weak (ln B_tot = 5.3)
alternative association here.

SXDF850.74 is also designated as less secure by C08.
This is another relatively nearby (z_phot = 0.7) optical galaxy



2 Here and throughout we define a plausible association to be 
one which has either a reasonable probability of being spurious 
(> 5%), or where it is not possible to discriminate between two 
or more possible associations 
Table 3. Comparison of SCUBA radio associations presented by Ivison et al. (2007) & Clements et al. (2008; C08) to those determined
here. Column 2 details whether or not the association from I07 is reproduced; Y for yes, N for no and an incorrect association has been
made. Where the incorrect, or an alternative, association has been made the values for the C08 association(s) are shown (when possible).
Separation quoted is distance from best-guess position to 850μm source in arcseconds.
1 This association is designated as less reliable in C08.

2 Here the best ID includes a MIPS 70μm & 160μm association.

3 Here our association agrees with C08, however they note that there are multiple plausible associations, which appear to be at the
same redshift so are probably associated. In each case we present data for the alternative associations below our match.

4 Same as 1, but here we choose one of the other associations.

5 The quoted position of this source in Table 1 of C08 is wrong.
with a very faint associated SWIRE source. Again we find
a weak (ln B_tot = 6.1) alternative association.

SXDF850.88 is another less secure association from
C08. Here we make an association with a closer, higher z
object, but again with weak evidence (ln B_tot = 6.1).

A good example of a difficult, but correctly made,
association is SXDF850.6. This source is one of the most
confused scenarios and the only one in our sample to have a
definitive sub-mm position from interferometric sub-mm
observations with the SMA (Iono et al. in prep). Figure 3 shows
the postage stamp image for SXDF850.6 and the first 6 best
fit SEDs for each possible optical+SWIRE source.

Clearly SXDF850.6 is one of the most difficult cases in
the sample for cross-identification. The true sub-mm source
is also one of the most distant, with several other sources
closer to the sub-mm position. From the values in Figure 3
it is clear that simply using the χ² statistic would not be
sufficient in this case; the lowest χ² is given by the fourth closest
source, which has ln B_sed = -3.43, while the true ID (#6) has a
worse χ², but significantly greater ln B_sed. This demonstrates the
power of using the Bayesian evidence, which takes into
account both the likelihood of an association being the correct
match, with the observed sub-mm flux, and the likelihood of
an association being the incorrect match, with undetected
sub-mm flux.

While the results of our approach on individual sources
are informative, it is worth considering the completeness
and reliability statistics as presented in Section 3. For no
cut on evidence we recover 24 of the 33 associations
presented in I07/C08. For a reasonable evidence threshold, i.e.
ln B_tot > 8, we recover 20 I07/C08 associations with one
discrepant (SXDF850.10), translating to a 64% completeness
rate, with 95% reliability. However it is possible, if not
likely, that some of the associations presented in I07/C08 are
not correct. In fact C08 go so far as to indicate which
associations they are not confident in; SXDF850.14, SXDF850.24,
SXDF850.69, SXDF850.74 & SXDF850.88. Of these we
only recover one with reasonable evidence (SXDF850.14).
If we exclude these associations from our I07/C08 "truth"
list then our completeness improves to 72%. Encouragingly
these completeness and reliability rates are very close to
those predicted from simulations in the previous section.
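
As a quick arithmetic check, the quoted 64% and 95% follow from the counts
above under one particular reading, which is an assumption made here since
the formal definitions are those of Section 3: completeness is taken as the
number of associations passing the threshold (the 20 recovered plus the one
discrepant) over the 33 in the truth list, and reliability as the fraction
of those passing associations that agree with I07/C08. The function name is
illustrative only.

```python
def completeness_and_reliability(n_truth, n_above_threshold, n_agree):
    """Fractions under the reading described in the lead-in (an assumption)."""
    completeness = n_above_threshold / n_truth
    reliability = n_agree / n_above_threshold
    return completeness, reliability


# 33 I07/C08 associations, 21 with ln B_tot > 8, of which 20 agree.
completeness, reliability = completeness_and_reliability(33, 21, 20)
print(f"completeness = {completeness:.0%}, reliability = {reliability:.0%}")
```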

A comparison of the photo-z estimates between C08 and
here is given in Table 3. There is some level of agreement
with the C08 photo-z measurements, although in a few cases
the redshifts are clearly discrepant. This is more clearly seen
in Figure 4, where the distribution of both sets of photo-z
estimates is shown. Also shown is the redshift distribution for
spectroscopically confirmed SCUBA galaxies from Chapman
et al. (2005). The median redshift for associations presented
here is z = 1.73, slightly higher than the C08 measure of
the same sample (z = 1.44) and significantly lower than the
Chapman et al. sample which has a median of z = 2.5.

While the disagreement between the C08 and our photo-z
measures is troubling, this is to be expected as although
the templates are similar, the photometric data and fitting
algorithm are subtly different. In particular our inclusion of
the UKIDSS near-IR data seems to have a significant effect
on the photo-z estimates. In 5 of the cases where our
photometric redshift is much different from C08 (SXDF850.5,
SXDF850.11, SXDF850.23, SXDF850.30 and SXDF850.71)
we find that the reason for the discrepancy is that the off- 
Figure 3. Postage stamp image of SHADES source SXDF850.6
and its surrounding area (top) and best-fit SEDs for each
potential counterpart (bottom). Background is SXDS V-band image.
Diamonds represent the SWIRE sources in the area. The circle
represents the 6" positional accuracy of SCUBA. For each SED,
the solid line is the overall best-fit optical and IR SED, the
dotted line is the optical SED, while the dot-dashed line is the IR
SED. The measured ln B_sed, best-fit χ² and photo-z are also
given. Where a source is undetected in the Spitzer bands the
upper limit is shown. Subsequent SMA observations show that all
the 850μm flux comes from ID #6 (Iono et al. in prep).

Figure 4. Comparison of photometric redshift estimates for
SCUBA sources from C08 (solid line) and this work (solid
grey). Also shown for comparison is the spectroscopic study of
Chapman et al. 2005 (dashed line). The median redshift of the
photometric redshift samples is slightly less, but still reasonably
consistent with the spectroscopic sample.



set from the SWIRE position and SXDS optical position is 
quite large (> 1.5") and our fit has been made using a closer
UKIDSS source. Thus only the UKIDSS near-IR and Spitzer
data have been used to constrain the photometric redshifts in
these cases. In all but one of these cases (SXDF850.11) our
photo-z is much higher than the C08 estimate. Considering
that these objects are possibly drop-outs in the deep SXDS
V band selected catalogue (V < 27.2), high redshifts (z > 3)
should be expected.

While the median photometric redshift for 850μm
sources in C08 is only slightly lower than that found here,
it is clear from Figure 4 that our median redshift is inflated
by a small number of objects at z > 3 and in fact the
distribution is missing the large peak of galaxies at z ~ 2.5
which appears in both the C08 and Chapman et al. analyses.
There are several reasons to expect this discrepancy. As
pointed out in C08 the spectroscopic redshift desert around
z ~ 1.5 slightly biases the Chapman et al. sample to higher
redshifts. However the greatest effect comes from requiring
a SWIRE and optical counterpart to each SCUBA source.
This inherently means that higher-z sources which are too
faint to be found in the SWIRE or SXDS catalogues
naturally fall out of our sample. While this was less of a problem
in C08, as they utilise the accompanying deep radio
associations, here we rely on the quality of the SWIRE data to
decide the correct association, in particular the 24μm data
which do not have a large number of sources at z > 2.
Thus with the data at hand it is only possible to associate
the low-z and/or high luminosity 850μm sources.



6 DISCUSSION 

6.1 Associations for other previously unassociated 
SHADES SCUBA sources 

While I07 present confident radio IDs for 33 SHADES
SCUBA sources in the SXDF region, a total of 60 > 3.5σ
850μm sources were identified. While some of these have
tenuous radio and/or mid-IR associations presented by I07,
these associations are either confused or have a high chance
of being spurious as determined by the p-statistic. However
as discussed in Section 4 the p-statistic can often be too
harsh on sources at high redshifts where the expected
counterpart would be faint.

Thus in addition to the subset of sources with confident
IDs we have also run our association algorithm on the full
SHADES 850μm catalogue in SXDF. Applying the same
ln B_tot > 8 cut used above we present 4 new plausible
associations for SHADES SCUBA sources without confident radio
IDs. The details of these associations are given in Table 4.

The remaining 23 SCUBA sources are again left without
optical-IR associations. It is likely that the bulk of these
sources are simply too faint in the Spitzer IRAC & MIPS
24μm bands to be found in the SWIRE survey.

6.2 Mid-IR Properties of SHADES SMGs

One of the distinct benefits of our association technique is
that in addition to identifying the correct counterpart the
best-fit photometric redshift and SED are also produced as
a by-product. This allows the optical and far-IR
luminosities to be easily investigated. Here we present some of these
derived properties as a "sanity" check that our SED fitting
technique is behaving as it should.

Here we only include sources for which we are confident
of the association via the ln B_tot > 8 cut. This leaves a
sample of 25 sources: the 21 from our analysis of the
confident C08 sample, and the 4 new associations made in the
previous section. The one known incorrect association with
ln B_tot > 8 (SXDF850.10) is included for the sake of
fairness. Figure 5 compares the integrated far-IR luminosity
(8-1100μm) to both redshift and 850μm flux. Encouragingly
the far-IR luminosities measured here are predominantly in
the range of 10^11 L⊙ to 10^13.6 L⊙, consistent with both I07
and previous work on sub-mm galaxies (Ivison et al. 2002;
Pope et al. 2006; Chapman et al. 2005).

Strong correlations are found between the far-IR
luminosity and both redshift and 850μm flux. This is not an
unprecedented result; both Ivison et al. (2002) & Pope et al.
(2006) found similar trends in smaller samples of SMGs with
photometric redshifts. Here we again conclude that the
correlation with redshift of the far-IR luminosities is a result of
evolution in both the number density and SED properties
of ULIRG-like galaxies.

Encouragingly our 4 new associations have luminosities
and redshifts consistent with the rest of the SMG sample.
However this does make their radio-weak nature (all 4 are
undetected in deep VLA radio imaging) somewhat
mysterious. To further examine whether we expect these sources
to be radio-weak, the 850μm/1.4 GHz flux ratio-to-redshift
correlation is shown in Figure 6.

5 SMGs in our combined sample have no accompanying
radio detection: the 4 new associations (SXDF850.32,
SXDF850.56, SXDF850.65, SXDF850.70) and SXDF850.71.
Of these 5, 3 have upper limits on the 850μm/1.4 GHz
flux ratio which are roughly consistent with both the
Chapman et al. relation and the A220 model (SXDF850.50,
SXDF850.56, SXDF850.71). The other 2 have upper limits
which are higher than both predictions, i.e. they should



Columns, in order: SXDF ID, Position (RA, Dec), z_phot, ln B_tot, ln B_sed, χ², F3.6, F4.5, F5.8, F8.0, F24, F850,
B, V, R, i', z', J, H, K, Separation ("). Values are listed in column order; "..." marks one or more empty cells.

Confident

32   34.345  -5.0126  2.30  10.69  -0.83  8.36  ...  185.6  6050  27.33  26.81  26.53  26.20  25.71  24.31  24.31  23.47  7.6
56   34.446  -5.1081  3.67  42.10  ...  268.2  16.97  16.84  ...  166.3  3650  29.01  28.02  26.72  26.26  25.48  ...  4.9
65   34.532  -5.0680  0.71  9.57  -3.2  113.6  6.24  3.9  ...  4350  26.97  25.94  25.24  24.14  23.59  23.11  22.84  22.42  5.6
70   34.547  -5.0468  1.48  10.62  -3.2  23.2  15.94  14.78  ...  4050  25.98  25.79  25.36  24.84  24.29  23.01  22.38  21.92  2.3

1 Association disagrees with a weak (< 40μJy) radio ID from Ivison et al. (2007).
2 Two plausible associations are found so both are quoted.
Figure 5. Far-IR luminosity vs. redshift (left panel) and 850μm
flux (right panel) for both the I07 SMG sample (open symbols)
and the 5 new associations made here (solid symbols). SMGs
are also broken down by dominant far-IR SED; circles represent
far-IR SEDs best fit by the A220 template, squares the M82
template, stars the cirrus template.



have been detected given the depth of the radio observations.
While this would appear to be a strong argument against
the plausibility of these associations, a large number of the
I07/C08 sample of SMGs are also found to have 850μm/1.4
GHz flux ratios much higher than expected. While this may
simply be a result of the large errors on both the SCUBA
850μm and 1.4 GHz radio fluxes and possibly erroneous
photo-z's, there are still a significant number of discrepant
SMGs even when errors are taken into account. 6 SMGs
from the I07/C08 sample are found to have discrepantly high
850μm/1.4 GHz flux ratios here. Of these one is the
incorrect association SXDF850.10. Another three are cases where
we have found a photometric redshift much less than C08
(SXDF850.8, SXDF850.11, SXDF850.37). In these cases the
flux ratio would not be discrepant if the SMG is actually at
the C08 photo-z estimate rather than the one made here.
This leaves two cases (SXDF850.96, SXDF850.119) in which
the 850μm/1.4 GHz flux ratio is inexplicably discrepant.
Interestingly in these cases the p-statistic for the radio ID
is significant (0.039 & 0.043, respectively). However
both have fairly significant evidence (ln B_tot = 9.1 & 10.0,
respectively). So while we hesitate to further downgrade the
status of these associations, this exercise again demonstrates
the diagnostic power of the radio data to discriminate
between plausible associations. Additionally this also
demonstrates the need for good quality redshifts, whether they be
spectroscopic or, more practically, well calibrated
photometric estimates.





Figure 6. 850μm to 1.4 GHz flux ratio vs. redshift for both the
I07/C08 sample and our 4 new associations. The solid line in
the left panel represents the empirical relation of Chapman et
al. (2005), while the dotted line represents an Arp220 template
(Carilli & Yun 2000).



6.3 Associations of SMGs in the Radio vs. the 
Mid-IR 

It is clear from the discussion above that deep interferomet- 
ric radio images remain the most effective way to identify 
counterparts to sub-mm galaxies. Of the 25 IDs we are able 
to present with some certainty, only 5 are without radio 
counterparts. 

When considering the practicality of using mid-IR data
to identify distant sub-mm sources it is worth noting that the
expected ratio between the sub-mm flux and those in the
IRAC and MIPS 24μm bands is significantly greater than
the sub-mm to radio flux ratio. This is emphasised in
Figure 7. Shown are various expected flux ratios for an Arp220
template in the Spitzer IRAC and MIPS bands, and also a
prediction of the 1.4 GHz radio from Carilli & Yun (2000).
It is clear that for a typical SCUBA 850μm source with
S_850μm ~ 5mJy, and an Arp220-like SED, at z ~ 2.5 we
would expect to detect it in the mid-IR at 24μm at ~50μJy
and at 8μm at ~1-2μJy, while in the near-IR the 3.6μm
flux would again be ~1μJy. These values are approximately
an order of magnitude fainter than the nominal detection
limits of the SWIRE survey. Clearly not all SMGs are so
weak in the mid-IR, as the samples identified here are clearly
identifiable in the SWIRE data. Deep Spitzer IRAC and
MIPS surveys in fields such as GOODS and UDS do
approach these depths and so we expect these datasets to be
invaluable in providing counterparts for future SCUBA-2
and Herschel sources in these fields.

Another major complication in trying to identify optical
to mid-IR counterparts for sub-mm galaxies is our
dependence on the template SEDs to describe accurately the
relationship between the flux in the different bands. In the

Figure 7. Various flux ratios for an Arp220-like SMG at different
redshifts. Expected ratios of the flux at 850μm to IRAC 3.6μm,
8.0μm and MIPS 24μm are shown as well as a model prediction
for the 1.4 GHz radio flux (Carilli & Yun 2000).

radio this is much simplified as the far-IR to radio
correlation is known to hold to high redshifts (Chapman et al. 2005;
Kovacs et al. 2006; Ibar et al. 2008). While here we have used
a simple set of templates which are known to crudely satisfy
a wide range of galaxy types (RR08), it is clear that
deriving a set of SEDs which properly map the properties of the
sources under investigation will provide better demarcation
between correct and incorrect associations. However rather
than this being a failure of the technique it could actually
be its most powerful benefit, as it allows for a greater
flexibility in the inclusion of prior information about the galaxy
population than simple techniques such as the p-statistic.
This will be discussed further in the following section.

6.4 General Comments on the Applicability of 
Bayesian Priors based Cross-identifications 

It is clear from the work presented above that this approach
is useful for situations where spatial associations are not
sufficient, but reasonable priors can be assumed about the
properties of the objects' SEDs. In addition, as this is an
automated technique which provides a single statistic (the
Bayes factor) as a measure of the "goodness" of an
association, it is also very useful in situations where the number of
sources requiring associations is larger than can be visually
inspected. A large number of current and upcoming
missions are within the bounds of these criteria, including the
BLAST experiment, SCUBA-2 legacy survey, and Herschel
extra-galactic legacy surveys (Pilbratt et al. 2001), among
many others.

As shown in Section 4, a major failing of the p-statistic
and similar techniques is that they have difficulty dealing
with cases where the surface density of real counterparts is
very high. This is often the case for sub-mm sources as they
tend to be at high redshift and hence the short wavelength
counterparts are expected to be faint, and therefore
numerous. There is no way to easily modify those techniques to
deal with this failing, as they are built around the notion
that true associations are those which could not occur by
chance.

This is not a difficulty for our approach as we do not
take into account the surface density of sources, but instead
use our prior knowledge about SED shapes. However, the
effectiveness of our technique is highly dependent on the
implementation of these priors, as can be seen in the results
of Sections 4 & 5. As we wished to simply test the association
algorithm, the templates and priors used in the analysis are
naive. Encouragingly even with these naive priors we are
able to achieve similar levels of completeness and reliability
as the p-statistic in both simulated and real data.

The real power of the technique is that it can be tailored
to specific applications by including priors that are more
specific to the population of galaxies under consideration.
One major failing of our naive set of templates is that they
are not an orthogonal basis set, hence introducing a bias
towards some SED shapes. For future applications a "gold
standard" set of associations (i.e. a representative sample of
associations which are known to be correct) could be used
to produce a basis set of templates via Principal
Component Analysis (PCA) or similar techniques. Alternatively, if
a large "gold standard" set can be defined, the technique
could be modified to do away with templates altogether and
simply use the "gold standard" set as the model distribution
to test against, in a similar way to the photometric redshift
technique presented by Wolf et al. (2009). In addition, all other
prior information about the galaxy population in question
can be included in a natural way, e.g. predicted luminosity
and redshift distributions.

It is clear that before our approach can be used the
priors on the SED must be carefully determined, tested, and
optimised on mock or well known real datasets. This extra
level of complexity means that existing techniques such as
the p-statistic or other simple approaches not discussed here,
such as the likelihood ratio (Sutherland & Saunders 1992),
may be preferred for applications where the surface density
of counterparts is low, and the flux distributions similar (i.e.
the brightest sources match to the brightest counterparts in
other catalogues). However in more difficult cases such as
those presented here our technique has the potential to
make reliable associations that could not be made with a
simpler approach.

6.5 Applicability to Future Herschel Surveys 

One of the most exciting applications for our proposed
technique could be finding short wavelength associations for
sources detected by the instruments aboard Herschel. In
particular the SPIRE instrument (Griffin et al. 2007) will
be able to image at 250μm, 350μm and 500μm. However
given that the primary mirror size of Herschel is 3.5m these
observations will be plagued by the same issues of poor
positional accuracy as existing sub-mm facilities. In
addition the BLAST experiment performed observations in the
SPIRE bands utilising a smaller balloon-borne 2m primary.
Thus it is of interest to see how well our technique could
band merge these datasets with shorter wavelength, in
particular MIPS 24μm, data.

For this exercise we construct a set of mock catalogues
from the GaLICS simulations which represent single band
SPIRE catalogues. In addition we create a mock "deep"
MIPS 24μm catalogue with which we want to associate our
SPIRE sources. In this scenario we only consider the SPIRE
bands, not only as these will be the most affected by
positional errors, but also as this should give some indication
of how effective this approach would be on existing data
from BLAST. The SPIRE catalogues are flux limited at
S250 > 5mJy, S350 > 7.5mJy, S500 > 6.5mJy. While these
fluxes are relatively arbitrary they were chosen to match
the expected source density at 250μm (~2000 per sq. deg.)
quoted by large Herschel survey programs such as HerMES
(Oliver et al. in prep). The 350μm and 500μm limits are
subsequently chosen to be equivalent limits for the exposure
time needed to reach the 250μm depth. For the MIPS 24μm
catalogue we select all objects with S24 > 100μJy. These
criteria result in 4 single band catalogues with 14578 24μm,
3631 250μm, 1187 350μm and 500 500μm sources
respectively. Positions for sources in each single band catalogue
are scattered via Gaussian random noise with a σ equal to
the expected positional uncertainty: 1.5" for MIPS 24μm,
3.5" for SPIRE 250μm, 4.7" for SPIRE 350μm and 7" for
SPIRE 500μm.
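
The construction of each single band catalogue can be sketched as follows.
This is a minimal illustration rather than the code used here: the input
format (a list of dictionaries with tangent-plane offsets x, y in arcsec and
per-band fluxes in mJy) and the function name are assumptions, as is the
choice to apply the quoted σ independently to each coordinate.

```python
import random

# Flux limits (mJy) and positional uncertainties (arcsec) quoted in the text;
# the 24 micron limit of 100 microJy is written as 0.1 mJy.
FLUX_LIMIT_MJY = {"24": 0.1, "250": 5.0, "350": 7.5, "500": 6.5}
SIGMA_ARCSEC = {"24": 1.5, "250": 3.5, "350": 4.7, "500": 7.0}


def make_band_catalogue(sources, band, rng):
    """Apply the flux limit for one band and scatter the surviving positions."""
    sigma = SIGMA_ARCSEC[band]
    catalogue = []
    for src in sources:
        flux = src["flux_mjy"].get(band)
        if flux is None or flux <= FLUX_LIMIT_MJY[band]:
            continue  # undetected in this band
        catalogue.append({
            "id": src["id"],  # true identity, kept to score matches later
            "x": src["x"] + rng.gauss(0.0, sigma),
            "y": src["y"] + rng.gauss(0.0, sigma),
            "flux_mjy": flux,
        })
    return catalogue


# Example: one simulated source, detected at 250 micron but not at 500 micron.
rng = random.Random(1)
truth = [{"id": 0, "x": 0.0, "y": 0.0, "flux_mjy": {"250": 6.0, "500": 5.0}}]
cat250 = make_band_catalogue(truth, "250", rng)
cat500 = make_band_catalogue(truth, "500", rng)
```

Keeping the true identifier in each scattered catalogue is what allows the
completeness and reliability of the recovered associations to be measured.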

While in the previous discussion the alternative
hypothesis for the calculation of B_sed was more obvious, i.e. that the
observed object has an SED inconsistent with the sub-mm
emission, here that hypothesis is not so applicable as it is
not practical to fit an SED template to the 24μm data alone.
Given this, and the surprising success of the p-statistic in
our previous tests, it seems natural to combine the two
approaches and use the p-statistic as the alternative
hypothesis. This shift is somewhat natural as the p-statistic is
defined as the probability of finding a source of a given flux
in a given search radius by random chance. Thus on face
value it is actually the statistic we are looking for to give
the probability that an association is simply a random
superposition.

This introduction of the p-statistic is mildly compli- 
cated as it is designed to give the probability of finding a 
single source of a given flux within a given search radius, not 
a collection of multiple sources. To overcome this we calcu- 
late the total probability of finding our collection of sources 
by chance by first calculating the p-statistic for each band, 
given the distance from the best-estimate position, the mea- 
sured flux and a search radius defined as 5x the expected 
positional uncertainty. Then the probabilities for each band 
are multiplied together to give the total probability that the 
association is a random superposition. 
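
A minimal sketch of this combination is given below. It assumes the simplest
form of the p-statistic, p = 1 - exp(-π r² n(>S)), where n(>S) is the surface
density of sources brighter than the measured flux; the correction for a
finite search radius is deliberately left out, and the surface densities are
taken as inputs supplied by the caller. Function names are illustrative only.

```python
import math


def p_statistic(r_arcsec, density_per_sq_arcsec):
    """Chance probability of a source at least this bright within r.

    Simplest Poisson form, p = 1 - exp(-pi r^2 n(>S)); the search-radius
    correction is omitted in this sketch.
    """
    return 1.0 - math.exp(-math.pi * r_arcsec ** 2 * density_per_sq_arcsec)


def total_chance_probability(per_band):
    """Multiply the per-band p-statistics for one candidate association.

    per_band is a list of (distance from best-estimate position in arcsec,
    surface density n(>S) in sources per square arcsec) pairs, one per band.
    """
    total = 1.0
    for r_arcsec, density in per_band:
        total *= p_statistic(r_arcsec, density)
    return total
```

Multiplying the per-band values treats the bands as independent, which is the
assumption implicit in the description above.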

We process these catalogues using our Bayesian
association technique with exactly the same approach as before,
except that now the alternative hypothesis in the calculation
of B_sed is the total p-statistic for the match, rather than an
SED fit. The caveat that each association contains a 24μm
source is introduced for practical reasons.

Processing these catalogues with the Bayesian
association technique results in 3612 matches of which 3424 are
100% correct. This translates to a completeness & reliability
of 95%. Of the 3612 associations made 2477 are made
on the basis of 250μm and 24μm data alone, while 1091
include a 350μm source, and 408 include a 500μm source.
Thus while the completeness levels for the 250μm sources
are very high, they drop to 91% for the 350μm sources, and
80% for the 500μm sources. The reliability also suffers with
94% of 350μm associations made correctly, and only 87% of
500μm associations made correctly. Interestingly, while the
error rate in the 500μm associations is disturbingly high, the
500μm flux quoted for the mismatch is usually very close
to the value found in the correct
association. In 71% of 500μm mismatches the flux of the
interloping source is within 10% of the correct source, and only
5 (9%) of interlopers have a flux which is more than 20% off
the true value. This is a nightmare scenario for our approach
as these mismatches cannot be distinguished from the
correct solution via either the SED (which would be almost the
same) or the p-statistic. However in practical terms these
mismatches would be unlikely to have any discernible effect
on the band merged catalogue or scientific use of it; the
500μm sources have the worst positional uncertainty and
hence contribute almost no information to the best estimate
position, while the real flux uncertainty of SPIRE 500μm
sources would be expected to be in the 5-10% range. Thus
while the associations of 500μm sources appear to have an
unacceptably low reliability this may prove to have little
consequence in terms of the scientific usage of catalogues
produced in this way.



7 CONCLUSION 

We have presented a new technique for finding associations 
between astronomical sources with large positional uncer- 
tainties. At the heart of our approach is a Bayesian frame- 
work for the association problem which extends that pre- 
sented by Budavari & Szalay (2008). Applications of the 
technique have been shown on both simulated and real sub- 
mm datasets from GaLICS and SCUBA, respectively. For 
simulations of existing ground-based sub-mm datasets the 
performance of our method is found to be comparable with 
the p-statistic, with the key difference being that our method 
is superior at recovering reliable associations for the highest 
redshift sources. 

Using a sample of SCUBA sources in SXDF from
Coppin et al. (2006) with good radio identifications from Ivison
et al. (2007) as a testbed we recover 22 of 33 (67%) radio
identifications using only the optical to mid-IR data. Using
a Bayes factor threshold it is possible to construct a
catalogue with reasonable completeness (20/33:64%) but very
high reliability (95%), successfully demonstrating the power
of combining SED information and spatial information in
a Bayesian way. Our technique finds plausible mid-IR
associations for 4 previously unassociated SHADES SCUBA
sources in the Subaru-XMM deep field.

Finally an application of the technique on future
Herschel SPIRE data is presented. We conclude that using our
approach to band merge sources from the 3 SPIRE bands
and Spitzer MIPS 24μm would result in merged catalogues
with a completeness and reliability of ~ 90%.
APPENDIX A: SUMMARY OF BUDAVARI &
SZALAY MATCHING TECHNIQUE

The association technique presented by Budavari & Szalay
(2008) relies on the calculation of the Bayes factor for each
combination of sources from different catalogues. If we define
H as the hypothesis that a set of astronomical positions from
different catalogues represent the same physical source, and
K the alternative hypothesis that they come from two or
more sources, then the Bayes factor can be written, after
applying Bayes' theorem, as

$$ B(H, K | D) = \frac{p(D | H)}{p(D | K)} $$

Budavari & Szalay show that this quantity can be
calculated in an iterative way over a series of catalogues via the
quantities $a_k$ and $q_k$, which represent the cumulative sum of
the weights, and the cumulative sum of the residuals,
respectively. These quantities are calculated via the following
equations,

$$ a_k = a_{k-1} + w_k $$

$$ q_k = q_{k-1} + \frac{a_{k-1}}{a_k} w_k |\vec{\Delta}_k|^2 $$

$$ \vec{c}_k = \left( \vec{c}_{k-1} + \frac{w_k}{a_k} \vec{\Delta}_k \right) \Big/ \left| \vec{c}_{k-1} + \frac{w_k}{a_k} \vec{\Delta}_k \right| $$

where

$$ a_k = \sum_{i=1}^{k} w_i $$

$$ w_i = \frac{1}{\sigma_i^2} $$

$$ \vec{\Delta}_i = \vec{x}_i - \vec{c}_{i-1} $$

and $\vec{c}_k$ is the unit vector of the best position for the current
combination of positions,

$$ \vec{c}_k = \sum_{i=1}^{k} w_i \vec{x}_i \Big/ \left| \sum_{i=1}^{k} w_i \vec{x}_i \right| $$

Finally the logarithm of the Bayes factor, also known
as the weight of evidence, is found by calculating

$$ \ln B = \ln N - \frac{1}{2} q_n $$

where

$$ N = 2^{n-1} \frac{\prod_{i=1}^{n} w_i}{\sum_{i=1}^{n} w_i} $$

and the sums and products run over the n catalogues.
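
The recursion summarised in this appendix can be sketched in Python as
follows. This is an illustrative implementation under stated assumptions, not
the code used for this work: positions are unit vectors, the positional
uncertainties σ_i are circular and given in arcseconds, the weights are taken
as w_i = 1/σ_i² with σ_i in radians, the normalisation is taken as
N = 2^(n-1) Π w_i / Σ w_i, and the function names are hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def unit_vector(ra_deg, dec_deg):
    """Unit vector on the sphere for a position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def ln_bayes_factor(positions, sigmas_arcsec):
    """Weight of evidence that n positions belong to one source.

    positions: unit vectors, one per catalogue.
    sigmas_arcsec: matching positional uncertainties in arcsec.
    """
    weights = [1.0 / (s * ARCSEC) ** 2 for s in sigmas_arcsec]
    a = weights[0]            # cumulative sum of weights, a_1
    q = 0.0                   # cumulative sum of residuals, q_1
    c = tuple(positions[0])   # best position so far, c_1
    for x, w in zip(positions[1:], weights[1:]):
        delta = [xi - ci for xi, ci in zip(x, c)]        # x_k - c_{k-1}
        a_new = a + w                                    # a_k
        q += (a / a_new) * w * sum(d * d for d in delta) # q_k
        moved = [ci + (w / a_new) * d for ci, d in zip(c, delta)]
        norm = math.sqrt(sum(m * m for m in moved))
        c = tuple(m / norm for m in moved)               # c_k, renormalised
        a = a_new
    n = len(weights)
    ln_n = ((n - 1) * math.log(2.0)
            + sum(math.log(w) for w in weights)
            - math.log(a))
    return ln_n - 0.5 * q
```

For two catalogues this reduces to the familiar form
B = 2/(σ1² + σ2²) exp[-ψ²/(2(σ1² + σ2²))], with ψ the angular separation in
radians, which provides a convenient check of any implementation.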



